Automatic Frame Length, Frame Overlap and Hidden Markov Model Topology for Speech Recognition of Animal Vocalizations
Author
Abstract
Preface

Automatic Speech Recognition (ASR) is a useful tool that can facilitate the research and study of animal vocalizations. The use of human speech-based signal processing techniques for animal vocalizations has several pitfalls. Animal vocalizations may not share the same spectral or temporal characteristics as human speech. As a result, the typical ASR assumptions concerning the best frame length, frame overlap and HMM topology may not be suitable for various animal vocalizations. This paper proposes a technique for estimating the frame length, frame overlap and HMM topology from a single, clean, example animal vocalization. Multiple trials are run using the proposed technique against the vocalizations of two distinct animal species: the Norwegian Ortolan Bunting (Emberiza hortulana) and the African Elephant (Loxodonta africana). The results are examined, and the technique provides reasonable estimates for the frame length, the frame overlap and the HMM topology, given the quality of the example vocalizations. Specific recommendations are made for the continuation of this research into a usable tool for animal researchers.

Acknowledgments

I thank all of the people who made this work possible, including, but not limited to, the following: To my wife, Denise, for her love and support and for graciously accepting my absence from our living room every evening for the last year. To my children, Logan and Zoe, for gracing our lives. To my parents, James and Kathleen, and to the Holy Trinity, for providing me with the gifts that make my life's work possible. Finally, I thank Sun Tzu, for teaching me how to live purposefully: "Withdraw like a mountain in movement, advance like a rainstorm. Strike and crush with shattering force; go into battle like a tiger." [1]

I dedicate this work to my wife Denise, my son Logan and my daughter Zoe, who are living examples of courage, unbounded energy and enthusiasm, respectively.
Similar resources
The Use of Adaptive Frame for Speech Recognition
We propose an adaptive frame speech analysis scheme that divides the speech signal into stationary and dynamic regions. Long frame analysis is used for stationary speech, and short frame analysis for dynamic speech. For computational convenience, the feature vector of the short frame is designed to be identical to that of the long frame. Two expressions are derived to represent the feature vector of short...
Automatic Type Classification and Speaker Identification of African Elephant Vocalizations
This paper presents systems for automatically classifying elephant vocalizations by type and for identifying the speaker of a given vocalization. The method applies techniques from the speech processing field, with modifications, to elephant vocalizations. The features used for classification are 12 Mel-Frequency Cepstral Coefficients computed using a chirp Z-transform to interpolate among the ...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Lecture 7: GMTK 2 DBNs for Automatic Speech Recognition
A “dynamic Bayesian network” (DBN) is a network that can dynamically change its size. In a hidden Markov model, for example, in order to recognize an audio file of T frames, we need to compute the values of T hidden state variables, q_1, ..., q_T. The state variables q_2 and q_3 are usually structurally identical; they depend on similar context variables, only shifted by one frame. Structures o...
Frame-dependent multi-stream reliability indicators for audio-visual speech recognition
We investigate the use of local, frame-dependent reliability indicators of the audio and visual modalities, as a means of estimating stream exponents of multi-stream hidden Markov models for audio-visual automatic speech recognition. We consider two such indicators at each modality, defined as functions of the speech-class conditional observation probabilities of appropriate audio- or visual-only ...
Publication date: 2006